BRIEFINGS IN FUNCTIONAL GENOMICS. VOL 11. NO 1. 38-50
doi:10.1093/bfgp/elr046
NGS technologies for analyzing
germplasm diversity in gene banks
Benjamin Kilian and Andreas Graner
Advance Access publication date 17 January 2012
Abstract
More than 70 years after the first ex situ genebanks were established, major efforts in this field are still concerned
with issues related to further completion of individual collections and securing of their storage. Attempts
at valorization of ex situ collections for plant breeders have been hampered by the limited availability of
phenotypic and genotypic information. With the advent of molecular marker technologies, first efforts were made
to fingerprint genebank accessions, albeit on a very small scale and mostly based on inadequate DNA marker systems.
Advances in DNA sequencing technology and the development of high-throughput systems for multiparallel
interrogation of thousands of single nucleotide polymorphisms (SNPs) now provide a suite of technological platforms
facilitating the analysis of several hundred gigabases per day using state-of-the-art sequencing technology or,
at the same time, of thousands of SNPs. The present review summarizes recent developments regarding the deployment
of these technologies for the analysis of plant genetic resources, in order to identify patterns of genetic diversity,
map quantitative traits and mine novel alleles from the vast amount of genetic resources maintained in
genebanks around the world. It also refers to the various shortcomings and bottlenecks that need to be overcome
to leverage the full potential of high-throughput DNA analysis for the targeted utilization of plant genetic resources.
Keywords: genetic resources; next-generation sequencing; SNP; allele mining; genetic diversity; association analysis
INTRODUCTION
Plant breeding needs to focus on traits with the
greatest potential to increase yield under changing
climate conditions [1]. Agricultural practices have
gradually displaced local traditional varieties and
crop wild relatives, leading to a dramatic loss of in-
digenous biodiversity. Tapping into the rich genetic
diversity inherent in a crop species and their wild
relatives is a prerequisite for germplasm improvement
in the future [2-7; http://www.fao.org]. Hence,
new technologies must be developed to accelerate
breeding through improving genotyping and pheno-
typing methods and by accessing the available gen-
etic diversity stored in genebanks around the world.
Prior to the advent of molecular characterization,
accessions in germplasm collections were mainly
examined based on morphological characters and
phenotypic traits [8]. The development of molecular
techniques now allows a more accurate analysis
of large collections. High-throughput (HT) technol-
ogies including DNA isolation, genotyping, pheno-
typing and next-generation sequencing (NGS)
provide new tools to add substantial value to gene-
bank collections. The integration of genomic
data into genebank documentation systems and its
combination with taxonomic, phenotypic and eco-
logical data will usher in a new era for the valoriza-
tion of plant genetic resources (PGR). From the
Corresponding author. Benjamin Kilian, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Genebank/
Genome Diversity, Corrensstrasse 3, 06466 Gatersleben, Germany. Tel.: +49 (0)39482 5-571; Fax: +49 (0)39482 5-500;
E-mail: kilian@ipk-gatersleben.de
*This article is dedicated to Heiko Parzies, plant geneticist and plant breeder who passed away far too early.
Benjamin Kilian is in the research group Genome Diversity at the IPK. His main interests are in genetic diversity, evolution and
domestication of Triticeae. He is in charge of projects aiming at exploiting natural genetic diversity by whole-genome association
mapping, high-throughput phenotyping and resequencing approaches.
Andreas Graner is managing director of the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) and the head of the
German Federal ex situ genebank for agricultural and horticultural plants. His research aims at developing genomics-based approaches
for the valorization of plant genetic resources of barley (Hordeum vulgare).
© The Author(s) 2012. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/
by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Figure 1: Ex situ collections are dominated by major crop species. (A) Of the more than 3000 crop species that are
maintained ex situ, 10 species totaling 3 540 000 accessions represent about half of the global inventory of ex situ
resources, which amounts to 7.4 million accessions. (B) Correlation of the aggregated size of the ex situ collections
with the acreage occupied by the individual crop species.
determination of phenotypic traits to the application
of NGS to whole genomes, every aspect of genom-
ics will have a great impact not only on PGR con-
servation, but also on their utilization in plant
breeding [9].
Identification and tracking of genetic variation has
become so efficient and precise that thousands of
candidate genes can be tracked within large gene-
bank collections [10]. Using NGS technologies, it is
possible to resequence candidate genes, entire tran-
scriptomes or entire plant genomes more efficiently
and economically than ever before. Advances in
sequencing technology will allow for whole-genome
resequencing of hundreds of individuals. In this way,
information on thousands of candidate genes and
candidate regions can be harnessed for thousands of
individuals to sample genetic diversity within and
between germplasm pools, to map Quantitative
Trait Loci (QTLs), to identify individual genes and
to determine their functional diversity. In this
review, we outline some important developments
in this field, where NGS technologies are expected
to enhance the value and thus the usefulness of gen-
ebank collections.
STATE OF EX SITU GERMPLASM
RESOURCES
PGR include cultivars, landraces, crop wild relatives
and mutants. The loss of genetic diversity in many
crop plants has resulted in efforts to collect PGR
which were initiated by Vavilov early in the 20th
century aiming at supporting plant breeders with
genetic material to extend genetic variability, as a
basis to create new crop varieties [11]. A wealth
of germplasm collections is available worldwide,
with more than 7 million accessions held in over
1700 genebanks (http://www.fao.org/docrep/013/i1500e/i1500e00.htm).
These do not evenly cover
all crop species but are highly biased regarding their
agricultural importance. About 50% of the global ex
situ germplasm is made up by only 10 crop species
with the three largest collections (wheat, rice and
barley) representing 28% of the global germplasm
(Figure 1). Passport and genotypic data suggest that
collections include different degrees of duplication,
resulting in approximately 1.9-2.2 million distinct accessions,
with the remainder being duplicates
(http://www.fao.org/docrep/013/i1500e/i1500e00.htm). Proper
conservation of PGR, along with the development
of best genebank practices and the promotion of their effective
use, is vital for food security in the future [12].
However, ex situ conservation is rather fragmented,
largely because it is mainly based on national pro-
grams and scattered institutional efforts. For instance,
barley (Hordeum vulgare L.) is maintained in more
than 200 collections worldwide, amounting to approximately
470 000 accessions [13]. Other crop species
follow similar patterns [14]. Despite manifold
efforts to coordinate genebank activities, conservation
is still inefficient in many places and suffers from
variable or even lacking standards, unreliable access
and poor characterization and documentation of the
material [15]. Ex situ germplasm collections for crop
wild relatives are rather limited in size due to the
difficulties in maintaining non-domesticated plants
[16]. Introgression from wild to cultivated germplasm
and vice versa, both during seed multiplication
in genebanks and in the wild, poses a problem
for proper maintenance and correct classification of
the material, which usually is based on only a few morphological
characters. Another problem is that genebank
accessions, even if they represent inbreeding
crop species, often are genetically heterogeneous and
may show residual heterozygosity. While this may
reflect the original genetic state, e.g. of a landrace
accession, it can seriously impair its molecular characterization
and its subsequent use for research and
breeding. Thus, most core collections are made up of
accessions which underwent purification by single
seed descent (SSD).
Systematic phenotypic analysis of genebank collections
is a time- and resource-intensive effort which
has been mainly restricted to agronomic traits
that show a high heritability and can be assessed
based on the per se performance of an accession.
Therefore, most evaluation efforts have focused on,
e.g., disease resistance and important morphological
characters (yield components) [8, 17].
Deep genetic and phenotypic characterization of
genetic resources by HT techniques, including resequencing
of enriched candidate genes and low-coverage
full-genome resequencing, will increasingly
become available. Concomitantly, large amounts of
data need to be integrated within the current documentation
systems. Genebanks have to prepare for
entering the genomics era by developing new strategies
and novel information tools to assess the genetic
diversity represented in their collections. Although
there have been some successful examples of extract-
ing useful genes from genebanks, the vast potential of
this resource still remains largely untapped [18, 19].
CHARACTERIZATION OF
GERMPLASM BY MOLECULAR
MARKERS: THE CURRENT STATE
A large series of studies have been undertaken to
study diversity, domestication, evolution and phyl-
ogeny of PGR, largely selected from genebank col-
lections. Early studies considered morphological and
cytogenetic characters. Various other techniques and
molecular markers have been applied subsequently
[20-23]. Until recently, amplified fragment length
polymorphism (AFLP) or simple sequence repeats
(SSR) were the molecular markers of choice for
DNA fingerprinting of crop genomes [24-26].
Owing to their amenability to systematic development
and HT detection, SNP markers are increasingly
applied to study genetic diversity in germplasm collections
of up to several hundred accessions.
Many of these collections have been established as
association panels for linkage disequilibrium (LD)
mapping, thus providing a first link between pheno-
typic and genotypic data sets. The corresponding
accessions have been selected from various germ-
plasm sources or breeding programs to represent a
rough cross section of the overall genetic diversity
available for a given species or for an ecogeographical
region [27, 28]. This is exemplified by a population
comprising 224 spring barley accessions, which were
selected from the Barley Core Collection, BCC [29]
and complemented by additional accessions to cover
the entire distribution range of this crop [30]. More
recently, about 1500 spring barley landraces adapted
to temperate climate conditions were selected among
22 093 Hordeum accessions of the Federal ex situ genebank
(IPK Gatersleben, Germany), based on their
origin and morphology. The whole set has been
genotyped by 43 SSR markers and analyzed for its
genetic structure. While this is intended to usher in
large-scale fingerprinting analysis of barley genebank
accessions, the approach still falls short of providing
informed molecular access to the entire collection.
Different marker systems for genetic diversity studies
and population parameters can be compared over a
collection as recently shown by [31] who compared
the performance of 42 SSR markers and 1536 SNP
markers. The marker type of choice and the number
of markers to be studied have to be adjusted for each
species and project.
Allele mining of individual loci
Plant accessions from wild or locally adapted landrace
genepools conserved in genebanks contain a rich
repertoire of alleles that have been left behind
by the selective processes of domestication, selection
and cross-breeding that paved the way to today's
elite cultivars. These resources stored in genebanks
remain underexplored owing to a lack of effi-
cient strategies to screen, isolate and transfer import-
ant alleles. The most effective strategy for
determining allelic richness at a given locus is currently
to determine its DNA sequence in a representative
collection of individuals. Large-scale allele
mining projects for germplasm collections at the molecular
level, such as the one described for Pm3
in wheat, are needed. Bhullar et al. [18] first selected a set of 1320
bread wheat landraces from a virtual collection of
16 089 accessions, using the focused identification
of germplasm strategy (FIGS), and isolated seven
new resistance alleles of the powdery mildew resistance
gene Pm3. Similarly, a series of novel alleles
has been detected for a recessive gene conferring
virus resistance in barley [32, 33]. Further resequencing
studies of candidate genes for agriculturally important
traits have been published, although from
smaller collections and mostly without functional
characterization [34-40].
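As a minimal sketch of what determining allelic richness at a single locus involves once sequences are available, the following Python example counts distinct alleles among aligned sequences. The function name, the input format and the toy sequences are assumptions made for illustration and are not taken from the studies cited above; real resequencing data would first require alignment and quality filtering.

```python
from collections import Counter

def allele_counts(aligned_seqs):
    """Count distinct alleles (identical aligned sequences) at one locus.

    aligned_seqs: dict mapping accession name -> aligned sequence string.
    Hypothetical input format, chosen for illustration only.
    """
    lengths = {len(s) for s in aligned_seqs.values()}
    if len(lengths) > 1:
        raise ValueError("sequences must be aligned to the same length")
    return Counter(s.upper() for s in aligned_seqs.values())

# Invented toy data: four accessions sequenced at one locus.
toy = {
    "acc1": "ATGCCA",
    "acc2": "ATGCCA",
    "acc3": "ATGTCA",
    "acc4": "ATGCCG",
}
counts = allele_counts(toy)
richness = len(counts)  # number of distinct alleles in the sample
singletons = [a for a, n in counts.items() if n == 1]  # alleles seen once
```

Alleles observed only once are the ones most easily missed in small samples, which is one reason why large, representative collections matter for allele mining.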
Resequencing of candidate genes using Sanger
sequencing has been applied to study phylogenetic
relationships of crop plants, their domestication, evo-
lution, speciation and ecological adaptation. Early
studies resequenced a single locus or few loci in
only few individuals per species [41, 42]. Reduced
costs for Sanger sequencing using capillary instruments
and 96-well formats facilitated multilocus studies
in larger collections [43-51].
NGS technologies to screen germplasm
collections
Large-scale NGS is now possible using platforms
such as Illumina/GA, Roche/GS FLX, Applied
Biosystems/SOLiD and cPAL sequencing [52, 53].
The declining cost of generating such data is transforming
all fields of genetics [54]. Many crop plant
genomes are characterized by the vast abundance of
repetitive DNA. For example, the genome of barley
comprises >5 Gb of DNA sequence of which <2%
can be accounted for by genes [55]. Therefore, to
avoid excessive sequencing of putatively non-informative,
repetitive DNAs, reduced-representation
sequencing techniques have been developed to
home in on a subset of the genome for sequencing
[56, 57]. When combined with techniques for labeling
reads (barcoding), DNA from many individuals
can be analyzed in the same pooled sequencing reaction,
and NGS provides an increasingly affordable
means of doing so. These technologies are therefore becoming a
standard choice for generating genetic data in fields
such as population genetics, conservation genetics
and molecular ecology. On the other hand, the
deluge of sequence data they produce will entail the necessity
to develop an appropriate IT infrastructure and new
computational solutions [58-64].
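The principle of barcoding can be illustrated with a short sketch in which reads from a pooled run are sorted back to their accession of origin by the barcode at the start of each read. The barcodes, sample names and reads below are invented, and exact prefix matching is a simplifying assumption; production pipelines typically also tolerate sequencing errors within the barcode.

```python
def demultiplex(reads, barcodes):
    """Assign reads to samples by exact match of a leading barcode.

    reads: iterable of read sequences (strings).
    barcodes: dict mapping barcode sequence -> sample name; no barcode
        may be a prefix of another one.
    Returns a dict sample -> list of reads with the barcode trimmed off.
    Reads without a known barcode are collected under "unassigned".
    """
    by_sample = {name: [] for name in barcodes.values()}
    by_sample["unassigned"] = []
    for read in reads:
        for bc, name in barcodes.items():
            if read.startswith(bc):
                by_sample[name].append(read[len(bc):])
                break
        else:
            # the loop finished without a match
            by_sample["unassigned"].append(read)
    return by_sample

# Hypothetical barcodes and reads for two pooled accessions.
barcodes = {"ACGT": "accession_A", "TGCA": "accession_B"}
reads = ["ACGTGGATCC", "TGCAGGATCT", "ACGTGGATCA", "GGGGGGATCC"]
pools = demultiplex(reads, barcodes)
```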
Sequencing many individuals at low depth is another
attractive strategy, e.g. for complex trait association
studies, as shown by [65]. While detailed
analysis of a single individual typically requires deep
sequencing, resequencing of many individuals allows
drastic reduction of sequencing depth when combined
with efficient genotype imputation to compensate
for missing data. Genotype imputation has been used
widely in the analysis of genome-wide association
studies (GWAS) to boost power and to facilitate
the combination of results across different studies
using meta-analyses [66, 67].
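To make the idea of imputation concrete, the sketch below fills missing genotype calls from the most similar other sample. This nearest-neighbor rule is a deliberately simple stand-in chosen for illustration; it is not the method used in the studies cited above, which rely on statistical models of haplotype structure. Sample names, the 0/1/2 coding and the data are assumptions.

```python
def impute_nearest(genotypes):
    """Fill missing calls (None) from the most similar other sample.

    genotypes: dict sample -> list of calls coded 0, 1, 2 or None.
    Similarity is the mismatch rate over sites called in both samples.
    """
    def mismatch_rate(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return 1.0  # no jointly called sites: treat as dissimilar
        return sum(x != y for x, y in shared) / len(shared)

    imputed = {}
    for name, calls in genotypes.items():
        # other samples, ordered from most to least similar
        others = sorted(
            (n for n in genotypes if n != name),
            key=lambda n: mismatch_rate(calls, genotypes[n]),
        )
        filled = list(calls)
        for i, call in enumerate(calls):
            if call is None:
                # copy the call from the closest sample that has one
                for n in others:
                    if genotypes[n][i] is not None:
                        filled[i] = genotypes[n][i]
                        break
        imputed[name] = filled
    return imputed

# Invented low-coverage data: s1 lacks a call at the fourth site.
toy = {
    "s1": [0, 0, 2, None, 2],
    "s2": [0, 0, 2, 2, 2],
    "s3": [2, 2, 0, 0, 0],
}
result = impute_nearest(toy)
```

Here s1 matches s2 at every jointly called site, so the missing call is taken from s2.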
We have not yet reached the point at which rou-
tine whole-genome resequencing of large numbers
of crop plant genomes becomes feasible. Therefore,
it is necessary to select genomic regions of interest
and to enrich these regions before sequencing.
Sequencing targeted regions of DNA (e.g. the
exome or parts thereof) rather than complete genomes
will likely be the preferred approach for most
genomics applications, including evolutionary biology,
association mapping and biodiversity conservation
[68]. Sequencing targeted regions on massively
parallel-sequencing instruments requires methods for
concomitant enrichment of the templates to be
sequenced. There are several enrichment approaches
available, each with advantages and disadvantages
[69-72]. Resequencing allows fingerprinting of
many individuals without the ascertainment bias that
is inherent to some SNP marker systems [73-75].
As outlined above, targeted resequencing of
hundreds of loci in genebank collections is already
feasible. Yet, the costs for DNA extraction, complexity
reduction and barcoding need to be brought
down for systematic resequencing of genebank collections.
In this context, large efforts have recently
been made to automate protocols for massively
parallel (re)sequencing and data analysis in order
to match the increasing instrument throughput.
These protocols, which include e.g. large-scale automatic
library preparation and size selection on
robots [76] or fully automated construction of barcoded
libraries [77], might be useful in paving the way
for automated NGS technologies to screen genebank
collections [78].
Multiparallel resequencing studies
Triggered by advancements in sequencing technol-
ogies, several crop genome sequences have been
produced or are underway [79-82]. Once good
quality levels have been achieved, these sequences
will enable researchers to address all kinds of biological
questions or to link sequence diversity accurately
to phenotypes.
Rapid developments in NGS will soon make
whole-genome resequencing in several individuals
or targeted resequencing of large germplasm collections
a reality. This will help to eliminate an important
difficulty in the estimation of LD and genetic relationships
between accessions in bi-allelic
genotyping studies, which is caused by ascertainment bias,
for instance with respect to rare alleles [73, 83-85].
Based on the available Arabidopsis thaliana (L.)
Heynh. genome sequence, Weigel and Mott [86]
advocated a 1001 Genomes project for Arabidopsis.
Several Arabidopsis lines have been sequenced since
[87, 88]. First studies on whole-genome resequen-
cing in crop species have been published for rice and
maize [66, 89, 90].
Combined genetic approaches have been performed for species
for which a complete genome sequence and millions of
SNPs are available. Such
approaches, which include e.g. large-scale genotyping,
targeted genomic enrichment, whole-genome resequencing
and GWAS, have been used to identify
allelic diversity, rare genetic variation and QTL and
to characterize them functionally [91-96], or to identify
selective sweeps of favorable alleles and candidate
mutations that have had a prominent role in domestication
[97].
TRAIT MAPPING IN PLANTS
Genome-wide marker discovery using
NGS
SNPs are the most abundant form of genetic variation
in eukaryotic genomes and are no longer a limiting
factor, even for crop species with large
genomes such as barley [98]. SNP markers are rapidly
replacing SSRs or Diversity Arrays Technology
(DArT) [99] markers because they are more abundant,
reproducible, amenable to automation and
increasingly cost-effective [100, 101]. SNP-based
resources are presently being developed and made
publicly available for broad application in crop re-
search [102].
A high-quality genomic sequence, as is available
for Arabidopsis and rice, represents the ideal blueprint
for resequencing and the identification of SNPs.
But even for species with less complete genomic
sequences such as barley and wheat [103, 104] or
other species [105-109], NGS methods are valuable
for genome-wide marker development, genotyping
and targeted sequencing across the genomes of populations
[110-112]. These new methods, which include
e.g. reduced-representation libraries (RRLs)
[113-115], complexity reduction of polymorphic
sequences (CRoPS) [116, 117], restriction-site-associated
DNA sequencing (RAD-seq) [118] and
low-coverage sequencing for genotyping [119-121],
are applicable to genetic analysis of non-model species,
of species with high levels of repetitive DNA or
of breeding germplasm with low levels of polymorphism,
without the need for prior sequence information.
These methods can be applied to
compare SNP diversity within and between closely
related plant species or within wild natural popula-
tions [122, 123].
Genome-wide association studies in crop
plants
The systematic characterization and utilization of
naturally occurring genetic variation has become an
important approach in plant genome research and
plant breeding. So far, linkage mapping based on
bi-parental progenies has proven useful in detecting
major genes and QTLs [124, 125]. Although this
approach has been successful in many analyses, it
suffers from several drawbacks. LD or association
mapping is an attractive alternative to traditional
linkage mapping and has several advantages over
it, e.g. the use of unstructured
populations that have been subjected to many
recombination events [126-128]. GWAS in diverse
germplasm collections offer new perspectives
towards gene and allele discovery for traits of agri-
cultural importance and dissecting the genetic basis
of complex quantitative traits in plants [129, 130].
However, GWAS require a genome-wide assessment
of genetic diversity (preferably based on a reference
genome sequence and resequenced parts
thereof), patterns of population structure, and the
decay of LD. For this, effective genotyping techniques
for plants, high-density marker maps, phenotyping
resources and, if possible, a high-quality
reference genome sequence are required [131]. In
many cases, the results of GWAS need confirmation
by linkage analysis.
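The decay of LD is usually assessed from a pairwise statistic such as r^2, defined for two bi-allelic loci as r^2 = D^2 / (pA(1-pA) pB(1-pB)) with D = pAB - pA pB, where pA and pB are allele frequencies and pAB is the frequency of the haplotype carrying both alleles. The sketch below computes it under the assumption that every accession is a homozygous line and thus contributes one haplotype; the function name and the toy data are illustrative only.

```python
def r_squared(locus_a, locus_b):
    """Pairwise LD (r^2) between two bi-allelic loci.

    locus_a, locus_b: equal-length lists of 0/1 alleles, one entry per
    homozygous (inbred) line, so each line contributes one haplotype.
    """
    n = len(locus_a)
    if n == 0 or n != len(locus_b):
        raise ValueError("loci must be non-empty and of equal length")
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    # frequency of haplotypes carrying allele 1 at both loci
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    return d * d / denom

# Invented examples: perfectly linked loci and independent loci.
linked = r_squared([0, 0, 1, 1], [0, 0, 1, 1])
unlinked = r_squared([0, 0, 1, 1], [0, 1, 0, 1])
```

Plotting such values against the genetic or physical distance between marker pairs gives the LD decay curve that determines how many markers a GWAS needs.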
GWAS have identified a large number of SNPs
associated with disease phenotypes in humans, including
in diverse worldwide populations [132]. Early
association mapping studies in crop plants were hampered
by the limited number of
mapped markers and thus were mainly based on
resequencing candidate genes [39, 40]. The development
of comprehensive sets of SNP markers that can
be interrogated in highly multiparallel HT SNP genotyping
ushered in the era of germplasm diversity
studies and GWAS in crop plants [87, 98, 119,
133-138].
For barley, few germplasm collections including
wild and landrace barley have been genotyped using
custom-made OPAs (oligo-pool assays) by Illumina
GoldenGate technology [139, 140]. SNP markers
significantly associated with traits are being used to
identify genomic regions that harbor candidate
genes for these traits in various collaborative barley
projects. It is relatively easy to detect marker-trait
associations in barley cultivar populations that
have extensive LD (5-10 cM). Conversely, populations
with low LD are expected to provide
high-resolution associations (landraces, <5 cM; wild
barley, <1 cM) but the number of markers needed to
find significant associations is relatively high. This
rapid decay in LD in populations of wild germplasm
is a key generic problem with genotyping for
bi-allelic SNPs. Furthermore, ascertainment bias of
bi-allelic SNP discovery, e.g. caused by rare alleles and
alleles not present in the elite cultivars, complicates
the situation in landraces and wild germplasm [73,
141]. Thus rare alleles are usually excluded from analysis.
Higher marker coverage is required in order to
identify candidate genes more efficiently in diverse
collections. In case of barley, a high density SNP
Chip has been developed, which contains 7864
bi-allelic SNPs coming from NGS of a broad range
of barley cultivars (R. Waugh et al, unpublished
data). Such customized arrays for HT SNP genotyp-
ing can accelerate genetic gain in breeding programs.
First barley association panels have been genotyped
using this resource (Figure 2). Similar SNP chips are
Figure 2: NeighborNet [166] of Hamming distances for 6885 polymorphic SNPs among 271 barley cultivars using
the 9K Infinium iSELECT HD custom genotyping Bead Chip (scale bar: 0.1). Barley cultivars Barke, Bowman and Morex are
highlighted as reference genotypes. Winter barleys form a cluster, which separates them clearly from the remaining
spring barley accessions.
becoming available for an increasing number of crop
plants [142, 143]. Combined studies using GWA
mapping, comparative analysis, linkage mapping,
resequencing and functional characterization of can-
didate genes already enabled the identification of
candidate genes for selected traits [66, 91, 128].
While genotyping arrays are useful for assessing
population structure and the decay of LD across
large numbers of samples, low-coverage whole-
genome sequencing will become the genotyping
method of choice for GWAS in plant species [66].
As for humans, GWAS for plants will become the
primary approach for identifying haplotypes and
genes with common alleles influencing complex
traits. However, common variations identified by
GWAS account for only a small fraction of trait her-
itability and are unlikely to explain the majority of
phenotypic variations of common traits. A potential
source of the missing heritability is the contribution
of rare alleles, insertion-deletion polymorphisms,
copy number variants and epigenetic differences
that can be detected by NGS technologies. However,
testing the association of rare variants with pheno-
types of interest is challenging. Novel powerful asso-
ciation methods designed for large-scale resequencing
data have to be developed [144-149].
In the future, it can be expected that mapping
by sequencing will become the method of choice
to discover the genes underlying quantitative trait
variation in large purified germplasm collections
[150-152] or epigenetic variation [84, 88, 153-155].
OUTLOOK
PGR of crop wild relatives or locally adapted crop
landraces contain a rich repertoire of alleles that have
been lost by the selective processes that generated
today's elite cultivars. Such alleles represent an
invaluable asset to cope with future challenges for
sustainable agricultural development and food production
[156, 157]. In the medium run, draft
genome sequences will be available for all major
and many neglected crop species and resequencing
of these genomes in germplasm collections will yield
a wealth of information. Transforming this deluge of
data to information and knowledge will increase our
understanding in all fields of genetics including evo-
lution, ecology, domestication and breeding. Now is
a crucial time to explore the potential implications of
this information revolution for genebanks and to
recognize opportunities and limitations in applying
NGS tools and HT technologies to genebank col-
lections [56, 158].
Sequence informed conservation and
utilization of PGR
The availability of sequence information can make a
significant contribution to the conservation of PGR.
The high degree of redundancy found between different
ex situ collections wastes a prohibitive amount
of resources (see above). Across the board, two-thirds
of the seed multiplication, which is the most resource-intensive
step of all conservation efforts, could be made
redundant if there were ways to unambiguously
identify duplicates. Most attempts to identify duplicated
samples have suffered from the difficulty of agreeing on
a common set of markers for a given species and from manifold
problems in reproducing DNA marker data between
different labs. DNA sequences do not suffer
from such shortcomings and therefore represent an
ideal information platform to tackle the issue of re-
dundancy. Arguably, sequencing of ex situ collections
just for the sake of eliminating redundancy would be
too expensive an undertaking. Combination of this
effort with one of the issues mentioned below could
provide an added value.
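How sequence data could flag candidate duplicates across collections can be sketched as follows: accessions whose sequence profiles are identical are grouped together. The accession identifiers and sequences are invented, and exact identity is a simplifying assumption; in practice, sequencing errors and the within-accession heterogeneity discussed earlier would call for a tolerance threshold and for confirmation against passport data.

```python
from collections import defaultdict

def putative_duplicates(profiles):
    """Group accessions whose sequence profiles are exactly identical.

    profiles: dict accession ID -> concatenated sequence (or genotype)
        string over a shared set of loci.
    Returns a sorted list of ID groups containing more than one member.
    """
    groups = defaultdict(list)
    for acc, seq in profiles.items():
        groups[seq.upper()].append(acc)
    return sorted(sorted(ids) for ids in groups.values() if len(ids) > 1)

# Hypothetical accessions held in three different genebanks.
profiles = {
    "GB1_001": "ACGTAC",
    "GB2_417": "ACGTAC",
    "GB1_002": "ACGTAT",
    "GB3_090": "acgtac",
}
dups = putative_duplicates(profiles)
```

Because sequences are directly comparable between labs, such a comparison does not depend on agreeing on a common marker set beforehand.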
Clearly, large crop collections cannot be sequenced
in a single step. Against the backdrop of the evolving
technology, a stepwise approach should be envisaged.
Glaszmann et al. [19] suggested the development of
'core reference sets' for our crops. A core reference
set (CRS) is to be understood as 'a set of genetic
stocks that are representative of the genetic resources
of the crop and are used by the scientific community
as a reference for an integrated characterization of its
biological diversity'. Every CRS will serve as a public,
standardized and well characterized resource for the
scientific community. Well characterized, multiplied,
isolated CRS have to be maintained for reference
purposes, comparative studies, future reanalysis and
integrative genomic analysis [59].
For this, already existing core collections must
be transformed into genetic stocks, purified (homogeneous/stabilized)
and taxonomically classified to
facilitate practical choices for comparative association
studies. Another approach is to select diverse
accessions directly from genebank collections
based on all available pre-existing characterization
and evaluation data (C&E), pedigree, origin and
collection site information. Survey genotyping to
test the purity of accessions can be done with various
molecular marker types such as inter-simple
Figure 3: DNA genotyping and sequencing as integral components for conservation and valorization of plant genetic
resources. Diagram labels, from top to bottom: Ex situ collections; Core Reference Sets/Core Collections;
Genotyping/Sequencing; Phenotypic analysis; Trait mapping; Gene isolation; Allele mining; Plant Breeding.
sequence repeats (ISSRs) or AFLPs. Mixed acces-
sions including more than one genotype have to
be advanced by SSD before entering into system-
atic molecular and phenotypic characterization
(Figure 3).
The scope of a genebank may be extended to that
of a DNA bank, similar to biobanks devoted to target
medical research [159]. The various implications of
DNA banks for PGR have been discussed elsewhere.
Common standards and Biobank Information
Management Systems (BIMSs) have to be developed
to deal with highly complex and diverse sets of meta-
data. Advanced technologies for high-quality bio-
sample storage and management systems are
available and have to be implemented [160, 161].
Precise phenotyping is one of the major bottle-
necks in characterizing large collections. New,
non-invasive, automated image analysis technologies
are currently under development for systematic phe-
notyping under greenhouse and field conditions
using novel sensing and imaging technologies.
Phenomics is an emerging field, in which large and
complex data sets are being produced. These require
long-term storage for future reanalysis when software
tools and algorithms have improved or for comparative
analysis [162, 163]. Pre-selection of contrasting
accessions by different strategies, including allele
mining approaches, genotyping using custom-made
Bead Chips and morphological characterization, is an
effective way to reduce the number of
accessions prior to thorough phenotyping, the
latter being the most time-consuming step.
The ultimate goal regarding the valorization of
PGR will be the deployment of novel alleles that
will improve the trait under consideration. While
resequencing of candidate genes is a straightforward
approach to identify allelic variation, deployment of
novel alleles in a breeding program is contingent on
prior phenotypic validation. So far, this has been re-
stricted to major genes, e.g. for disease resistance and
seed quality. Validation of alleles of candidate genes
for quantitative traits, e.g. by Targeting Induced Local
Lesions in Genomes (TILLING), still remains a major
challenge [164, 165]. In this regard, the ability
to replace alleles by site-specific recombination
could spur the targeted utilization of PGR and
thus greatly enhance the value chain of biodiversity.
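The first step mentioned above, identifying allelic variation in a resequenced candidate gene, can be sketched as follows. The sketch groups aligned sequences into haplotypes based on their segregating sites and lists haplotypes absent from a set of reference cultivars. It assumes an existing alignment of equal-length sequences; all function names, accession names and sequences are hypothetical.

```python
def segregating_sites(alignment):
    """Return alignment columns showing more than one base (gaps and N ignored)."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")
    sites = []
    for i in range(length):
        bases = {s[i].upper() for s in seqs} - {"-", "N"}
        if len(bases) > 1:
            sites.append(i)
    return sites


def haplotypes(alignment):
    """Group accessions by their bases at the segregating sites.

    Note: a sequence with a gap or N at a segregating site forms its own
    group here; real data would need explicit missing-data handling.
    """
    sites = segregating_sites(alignment)
    groups = {}
    for name, seq in alignment.items():
        hap = "".join(seq[i].upper() for i in sites)
        groups.setdefault(hap, []).append(name)
    return groups


def novel_haplotypes(alignment, reference_names):
    """Haplotypes carried by no accession in the reference set."""
    reference = set(reference_names)
    return {
        hap: members
        for hap, members in haplotypes(alignment).items()
        if not reference & set(members)
    }


# Hypothetical aligned fragment of a candidate gene.
example_alignment = {
    "cultivar_1": "ATGCCGTA",
    "cultivar_2": "ATGCCGTA",
    "landrace_1": "ATGTCGTA",
    "wild_1": "ATGTCGAA",
}
candidates = novel_haplotypes(example_alignment, ["cultivar_1", "cultivar_2"])
```

Haplotypes flagged in this way are only candidates; as stated in the text, their deployment remains contingent on phenotypic validation.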
Key Points
• Novel statistical approaches and promising NGS approaches are
becoming available to screen major genebank collections. NGS
will provide a platform for the large-scale development of SNPs
that can be assayed in a highly parallel manner for HT genotyping.
• As an alternative to SNP analysis, genotyping by sequencing will
be employed to obtain information on SNP and haplotype patterns.
• A staggered strategy starting from core collections is proposed
to genotype and/or resequence genetic resources.
• Leveraging the full potential of sequence information on PGR
depends on the availability of accurate phenotypic information
and the potential to validate novel alleles at the phenotypic level.
FUNDING
This work has been funded by the Leibniz Institute of
Plant Genetics and Crop Plant Research (IPK).
References
1. Long SP, Ort DR. More than taking the heat: crops and
global change. Curr Opin Pl Biol 2010;13:240-7.
2. Tanksley SD, McCouch SR. Seed banks and molecular
maps: unlocking genetic potential from the wild. Science
1997;277:1063-6.
3. Zamir D. Improving plant breeding with exotic genetic
libraries. Nat Rev Genet 2001;2:983-9.
4. Hoisington D, Khairallah M, Reeves T, et al. Plant genetic
resources: what can they contribute toward increased crop
productivity? Proc Natl Acad Sci USA 1999;96:5937-43.
5. Fernie AR, Tadmor Y, Zamir D. Natural genetic variation
for improving crop quality. Curr Opin Pl Biol 2006;9:
196-202.
6. Takeda S, Matsuoka M. Genetic approaches to crop im-
provement: responding to environmental and population
changes. Nat Rev Genet 2008;9:444-57.
7. Tester M, Langridge P. Breeding technologies to increase
crop production in a changing world. Science 2010;327:
818-22.
8. Boerner A, Freytag U, Sperling U. Analysis of wheat disease
resistance data originating from screenings of Gatersleben
genebank accessions during 1933 and 1992. Genet Resour
Crop Evol 2006;53:453-65.
9. Van K, Kim DH, Shin JH, et al. Genomics of plant genetic
resources: past, present and future. Pl Genet Res 2011;9:
155-8.
10. Varshney RK, Nayak SN, May GD, et al. Next-generation
sequencing technologies and their implications for crop
genetics and breeding. Trend Biotechnol 2009;27:522-30.
11. Loskutov IG. Vavilov and his Institute. A history of the world
collection of plant genetic resources in Russia. Rome, Italy:
International Plant Genetic Resources Institute, 1999.
12. Van Heerwaarden J, van Eeuwijk FA, Ross-Ibarra J.
Genetic diversity in a crop metapopulation. Heredity 2009;
104:28-39.
13. Knüpffer H. Triticeae genetic resources in ex situ genebank
collections. In: Muehlbauer GJ, Feuillet C (eds). Genetics and
Genomics of the Triticeae. Springer, 2009:31-79.
14. Leung H, Hettel GP, Cantrell RP. International Rice
Research Institute: roles and challenges as we enter the
genomics era. Trend Pl Sci 2002;7:139-42.
15. Khoury C, Laliberte B, Guarino L. Trends in ex situ con-
servation of plant genetic resources: a review of global crop
and regional conservation strategies. Genet Res Crop Evol
2010;57:625-39.
16. Kilian B, Ozkan H, Shaaf S, et al. Comparing genetic diver-
sity within a crop and its wild progenitor: a case study for
barley. In: Maxted N, Dulloo ME, Ford-Lloyd BV, et al.
(eds). Agrobiodiversity Conservation: Securing the Diversity of
Crop Wild Relatives and Landraces. CABI, 2011.
17. Perovic D, Przulj N, Milovanovic M, et al. Characterisation
of spring barley genetic resources in Yugoslavia. In:
Knüpffer H, Ochsmann J (eds). Rudolf Mansfeld and Plant
Genetic Resources. Proceedings of the symposium dedicated to the
100th birthday of Rudolf Mansfeld, Gatersleben, Germany, 8-9
October, Vol. 22. Schriften zu Genetischen Ressourcen,
2001:301-6.
18. Bhullar NK, Street K, Mackay M, et al. Unlocking wheat
genetic resources for the molecular identification of previ-
ously undescribed functional alleles at the Pm3 resistance
locus. Proc Natl Acad Sci USA 2009;106:9519-24.
19. Glaszmann JC, Kilian B, Upadhyaya HD, et al. Accessing
genetic diversity for crop improvement. Curr Opin Pl Biol
2010;13:167-73.
20. Kovach MJ, McCouch SR. Leveraging natural diversity:
back through the bottleneck. Curr Opin Pl Biol 2008;11:
193-200.
21. Feuillet C, Muehlbauer GJ. Genetics and genomics of the
triticeae. In: Feuillet C, Muehlbauer GJ (eds). Plant Genetics
and Genomics: Crops and Models, Vol. 7. Springer, 2009.
22. Sang T. Genes and mutations underlying domestication
transitions in grasses. Pl Physiol 2009;149:63-70.
23. Kilian B, Mammen K, Millet E, et al. Aegilops L. In: Kole C
(ed). Wild Crop Relatives: Genetic and Breeding Resources. Springer,
2011.
24. Heun M, Schaefer-Pregl R, Klawan D, et al. Site of einkorn
wheat domestication identified by DNA fingerprinting.
Science 1997;278:1312-4.
25. Castillo A, Dorado G, Feuillet C, et al. Genetic structure and
ecogeographical adaptation in wild barley (Hordeum chilense
Roemer et Schultes) as revealed by microsatellite markers.
BMC Pl Biol 2010;10:266.
26. Allender C, King G. Origins of the amphiploid species
Brassica napus L. investigated by chloroplast and nuclear mo-
lecular markers. BMC Pl Biol 2010;10:54.
27. McMullen MD, Kresovich S, Villeda HS, et al. Genetic
properties of the maize nested association mapping popula-
tion. Science 2009;325:737-40.
28. Chao S, Dubcovsky J, Dvorak J, et al. Population- and
genome-specific patterns of linkage disequilibrium and SNP
variation in spring and winter wheat (Triticum aestivum L.).
BMC Genome 2010;11:727.
29. Knüpffer H, van Hintum Th. Summarised diversity - the
Barley Core Collection. In: Bothmer R, Knüpffer H, van
Hintum T, Sato K (eds). Diversity in barley (Hordeum
vulgare). Elsevier Science, 2003:259-367.
30. Haseneyer G, Stracke S, Paul C, et al. Population struc-
ture and phenotypic variation of a spring barley world
collection set up for association studies. Pl Breed 2010;129:
271-9.
31. Hübner S, Günther T, Flavell A, et al. Islands and streams:
clusters and gene flow in wild barley populations from the
Levant. Mol Ecol (in press).
32. Stein N, Perovic D, Kumlehn J, et al. The eukaryotic trans-
lation initiation factor 4E confers multiallelic recessive
Bymovirus resistance in Hordeum vulgare (L.). Plant J 2005;42:
912-22.
33. Hofinger BJ, Russell JR, Bass CG, et al. An exceptionally
high nucleotide and haplotype diversity and a signature of
positive selection for the eIF4E resistance gene in barley are
revealed by allele mining and phylogenetic analyses of nat-
ural populations. Mol Ecol 2011;20:3653-68.
34. Saitoh K, Onishi K, Mikami I, et al. Allelic diversification at
the C (OsC1) locus of wild and cultivated rice. Genetics
2004;168:997-1007.
35. Kilian B, Ozkan H, Deusch O, et al. Independent wheat B
and G genome origins in outcrossing Aegilops progenitor
haplotypes. Mol Biol Evol 2007;24:217-27.
36. Zhu Q, Zheng X, Luo J, et al. Multilocus analysis of nu-
cleotide variation of Oryza sativa and its wild relatives: severe
bottleneck during domestication of rice. Mol Biol Evol 2007;
24:875-88.
37. Jones H, Leigh FJ, Mackay I, et al. Population-based rese-
quencing reveals that the flowering time adaptation of
cultivated barley originated east of the Fertile Crescent.
Mol Biol Evol 2008;25:2211-9.
38. Kovach MJ, Calingacion MN, Fitzgerald MA, et al. The
origin and evolution of fragrance in rice (Oryza sativa L.).
Proc Natl Acad Sci USA 2009;106:14444-9.
39. Stracke S, Haseneyer G, Veyrieras JB, et al. Association
mapping reveals gene action and interactions in the deter-
mination of flowering time in barley. Theor Appl Genet
2009;118:259-73.
40. Haseneyer G, Stracke S, Piepho HP, et al. DNA poly-
morphisms and haplotype patterns of transcription factors
involved in barley endosperm development are associated
with key agronomic traits. BMC Pl Biol 2010;10:5.
41. Kellog EA, Appels R, Mason-Gamer AJ. When genes
tell different stories: the diploid genera of Triticeae
(Gramineae). Syst Bot 1996;21:321-47.
42. Lin JZ, Brown AHD, Clegg MT. Heterogeneous geo-
graphic patterns of nucleotide sequence diversity between
two alcohol dehydrogenase genes in wild barley (Hordeum
vulgare subspecies spontaneum). Proc Natl Acad Sci USA 2001;
98:531-6.
43. Vaughan DA, Morishima H, Kadowaki K. Diversity in the
Oryza genus. Curr Opin Pl Biol 2003;6:139-46.
44. Wright SI, Bi IV, Schroeder SG, et al. The effects of artifi-
cial selection on the maize genome. Science 2005;308:
1310-4.
45. Hyten DL, Song Q, Zhu Y, et al. Impacts of genetic bottle-
necks on soybean genome diversity. Proc Natl Acad Sci USA
2006;103:16666-71.
46. Haudry A, Cenci A, Ravel C, et al. Grinding up wheat:
a massive loss of nucleotide diversity since domestication.
Mol Biol Evol 2007;24:1506-17.
47. Kilian B, Ozkan H, Walther A, et al. Molecular diversity at
18 Loci in 321 wild and 92 domesticate lines reveal no
reduction of nucleotide diversity during Triticum monococcum
(einkorn) domestication: implications for the origin of agri-
culture. Mol Biol Evol 2007;24:2657-68.
48. Izawa T, Konishi S, Shomura A, et al. DNA changes tell us
about rice domestication. Curr Opin Pl Biol 2009;12:185-92.
49. Labate JA, Robertson LD, Baldo AM. Multilocus sequence
data reveal extensive departures from equilibrium in domes-
ticated tomato (Solanum lycopersicum L.). Heredity 2009;103:
257-67.
50. Tian F, Stevens NM, Buckler ES. Tracking footprints of
maize domestication and evidence for a massive selective
sweep on chromosome 10. Proc Natl Acad Sci USA 2009;
106:9979-86.
51. Escobar J, Scornavacca C, Cenci A, et al. Multigenic phyl-
ogeny and analysis of tree incongruences in Triticeae
(Poaceae). BMC Evol Biol 2011;11:181.
52. Shendure J, Ji H. Next-generation DNA sequencing.
Nat Biotech 2008;26:1135-45.
53. Metzker ML. Sequencing technologies - the next gener-
ation. Nat Rev Genet 2010;11:31-46.
54. Lister R, Gregory BD, Ecker JR. Next is now: new tech-
nologies for sequencing of genomes, transcriptomes, and
beyond. Curr Opin Pl Biol 2009;12:107-18.
55. Wicker T, Taudien S, Houben A, et al. A whole-genome
snapshot of 454 sequences exposes the composition of
the barley genome and provides evidence for parallel evo-
lution of genome size in wheat and barley. Plant J 2009;59:
712-22.
56. Paterson AH. Leafing through the genomes of our major
crop plants: strategies for capturing unique information.
Nat Rev Genet 2006;7:174-84.
57. Baird NA, Etter PD, Atwood TS, et al. Rapid SNP discov-
ery and genetic mapping using sequenced RAD markers.
PLoS ONE 2008;3:e3376.
58. Alexander RP, Fang G, Rozowsky J, et al. Annotating
non-coding regions of the genome. Nat Rev Genet 2010;
11:559-71.
59. Hawkins RD, Hon GC, Ren B. Next-generation gen-
omics: an integrative approach. Nat Rev Genet 2010;11:
476-86.
60. McKenna A, Hanna M, Banks E, et al. The Genome
Analysis Toolkit: a MapReduce framework for analyzing
next-generation DNA sequencing data. Genome Res 2010;
20:1297-303.
61. Schadt EE, Linderman MD, Sorenson J, et al.
Computational solutions to large-scale data management
and analysis. Nat Rev Genet 2010;11:647-57.
62. Surget-Groba Y, Montoya-Burgos JI. Optimization of de
novo transcriptome assembly from next-generation sequen-
cing data. Genome Res 2010;20:1432-40.
63. Nielsen R, Paul JS, Albrechtsen A, et al. Genotype and SNP
calling from next-generation sequencing data. Nat Rev
Genet 2011;12:443-51.
64. Zhang W, Chen J, Yang Y, et al. A practical comparison
of De Novo genome assembly software tools for next-
generation sequencing technologies. PLoS ONE 2011;6:
e17915.
65. Li Y, Sidore C, Kang HM, et al. Low-coverage sequencing:
implications for design of complex trait association studies.
Genome Res 2011;21:940-51.
66. Huang X, Wei X, Sang T, et al. Genome-wide association
studies of 14 agronomic traits in rice landraces. Nat Genet
2010;42:961-7.
67. Marchini J, Howie B. Genotype imputation for
genome-wide association studies. Nat Rev Genet 2010;11:
499-511.
68. Kirkness EF. Targeted sequencing with microfluidics.
Nat Biotech 2009;27:998-9.
69. Tewhey R, Nakano M, Wang X, et al. Enrichment of
sequencing targets from the human genome by solution
hybridization. Genome Biol 2009;10:R116.
70. Gnirke A, Melnikov A, Maguire J, et al. Solution hybrid
selection with ultra-long oligonucleotides for massively par-
allel targeted sequencing. Nat Biotech 2009;27:182-9.
71. Mamanova L, Coffey AJ, Scott CE, et al. Target-enrichment
strategies for next-generation sequencing. Nat Meth 2010;7:
111-8.
72. Teer JK, Bonnycastle LL, Chines PS, et al. Systematic
comparison of three genomic enrichment methods for
massively parallel DNA sequencing. Genome Res 2010;20:
1420-31.
73. Moragues M, Comadran J, Waugh R, et al. Effects of ascer-
tainment bias and marker number on estimations of barley
diversity from high-throughput SNP genotype data. Theor
Appl Genet 2010;120:1525-34.
74. Cosart T, Beja-Pereira A, Chen S, et al. Exome-wide DNA
capture and next generation sequencing in domestic and
wild species. BMC Genome 2011;12:347.
75. Schuenemann VJ, Bos K, DeWitte S, et al. Targeted enrich-
ment of ancient pathogens yielding the pPCP1 plasmid of
Yersinia pestis from victims of the Black Death. Proc Natl Acad
Sci USA 2011;108:E746-52.
76. Borgstrom E, Lundin S, Lundeberg J. Large scale library
generation for high throughput sequencing. PLoS ONE
2011;6:e19119.
77. Lennon N, Lintner R, Anderson S, et al. A scalable, fully
automated process for construction of sequence-ready bar-
coded libraries for 454. Genome Biol 2010;11:R15.
78. Zheng J, Moorhead M, Weng L, et al. High-throughput,
high-accuracy array-based resequencing. Proc Natl Acad Sci
USA 2009;106:6712-7.
79. Feuillet C, Leach JE, Rogers J, et al. Crop genome sequen-
cing: lessons and rationales. Trend Pl Sci 2011;16:77-88.
80. Schmutz J, Cannon SB, Schlueter J, et al. Genome se-
quence of the palaeopolyploid soybean. Nature 2010;463:
178-83.
81. Young ND, Debelle F, Oldroyd GED, et al. The Medicago
genome provides insight into the evolution of rhizobial
symbioses. Nature 2011;480:520-24.
82. Varshney RK, Chen W, Li Y, et al. Draft genome sequence
of pigeonpea (Cajanus cajan), an orphan legume crop of
resource-poor farmers. Nat Biotech 2011, doi:10.1038/
nbt.2022.
83. Li R, Li Y, Fang X, et al. SNP detection for massively par-
allel whole-genome resequencing. Genome Res 2009;19:
1124-32.
84. Rafalski JA. Genomic tools for the analysis of genetic
diversity. Pl Genet Res 2011;9:159-62.
85. Wang L, Li P, Brutnell TP. Exploring plant transcriptomes
using ultra high-throughput sequencing. Brief Funct Genomic
2010;9:118-28.
86. Weigel D, Mott R. The 1001 Genomes Project for
Arabidopsis thaliana. Genome Biol 2009;10:107.
87. Cao J, Schneeberger K, Ossowski S, et al. Whole-genome
sequencing of multiple Arabidopsis thaliana populations.
Nat Genet 2011;43:956-63.
88. Lister R, Ecker JR. Finding the fifth base: genome-wide
sequencing of cytosine methylation. Genome Res 2009;19:
959-66.
89. Lai J, Li R, Xu X, et al. Genome-wide patterns of genetic
variation among elite maize inbred lines. Nat Genet 2010;42:
1027-30.
90. He Z, Zhai W, Wen H, et al. Two evolutionary histories in
the genome of rice: the roles of domestication genes. PLoS
Genet 2011;7:e1002100.
91. Ramsay L, Comadran J, Druka A, et al.
INTERMEDIUM-C, a modifier of lateral spikelet fertility
in barley, is an ortholog of the maize domestication
gene TEOSINTE BRANCHED 1. Nat Genet 2011;43:
169-72.
92. Muir WM, Wong GK-S, Zhang Y, et al. Genome-wide
assessment of worldwide chicken SNP genetic diversity in-
dicates significant absence of rare alleles in commercial
breeds. Proc Natl Acad Sci USA 2008;105:17312-7.
93. Huang X, Qian Q, Liu Z, et al. Natural variation at the
DEP1 locus enhances grain yield in rice. Nat Genet 2009;
41:494-7.
94. Todesco M, Balasubramanian S, Hu TT, et al. Natural allelic
variation underlying a major fitness trade-off in Arabidopsis
thaliana. Nature 2010;465:632-6.
95. Mokry M, Nijman I, van Dijken A, et al. Identification of
factors required for meristem function in Arabidopsis using a
novel next generation sequencing fast forward genetics ap-
proach. BMC Genome 2011;12:256.
96. Yan J, Kandianis CB, Harjes CE, et al. Rare genetic variation
at Zea mays crtRB1 increases β-carotene in maize grain. Nat
Genet 2010;42:322-7.
97. Rubin CJ, Zody MC, Eriksson J, et al. Whole-genome
resequencing reveals loci under selection during chicken
domestication. Nature 2010;464:587-91.
98. Close T, Bhat P, Lonardi S, et al. Development and imple-
mentation of high-throughput SNP genotyping in barley.
BMC Genome 2009;10:582.
99. Wenzl P, Li H, Carling J, et al. A high-density consensus
map of barley linking DArT markers to SSR, RFLP
and STS loci and agricultural traits. BMC Genome 2006;
7:206.
100. Ganal MW, Altmann T, Roeder MS. SNP identification in
crop plants. Curr Opin Pl Biol 2009;12:211-7.
101. Alsop B, Farre A, Wenzl P, et al. Development of wild
barley-derived DArT markers and their integration into a
barley consensus map. Mol Breed 2011;27:77-92.
102. McCouch SR, Zhao K, Wright M, et al. Development of
genome-wide SNP assays for rice. Breed Sci 2010;60:
524-35.
103. Paux E, Sourdille P, Salse J, et al. A physical map of the
1-Gigabase bread wheat chromosome 3B. Science 2008;322:
101-4.
104. Mayer KFX, Martis M, Hedley PE, et al. Unlocking the
barley genome by chromosomal and comparative genomics.
Plant Cell 2011;23:1249-63.
105. Kuelheim C, Hui Yeoh S, Maintz J, et al. Comparative SNP
diversity among four Eucalyptus species for genes from sec-
ondary metabolite biosynthetic pathways. BMC Genome
2009;10:452.
106. Hribova E, Neumann P, Matsumoto T, et al. Repetitive
part of the banana (Musa acuminata) genome investigated
by low-depth 454 sequencing. BMC Pl Biol 2010;10:204.
107. Griffin P, Robin C, Hoffmann A. A next-generation
sequencing method for overcoming the multiple gene
copy problem in polyploid phylogenetics, applied to Poa
grasses. BMC Biol 2011;9:19.
108. Potato Genome Sequencing Consortium. Genome se-
quence and analysis of the tuber crop potato. Nature 2011;
475:189-95.
109. Shulaev V, Sargent DJ, Crowhurst RN, et al. The genome
of woodland strawberry (Fragaria vesca). Nat Genet 2011;43:
109-16.
110. Bansal V, Harismendy O, Tewhey R, et al. Accurate detec-
tion and genotyping of SNPs utilizing population sequen-
cing data. Genome Res 2010;20:537-45.
111. Davey JW, Hohenlohe PA, Etter PD, et al. Genome-wide
genetic marker discovery and genotyping using next-
generation sequencing. Nat Rev Genet 2011;12:499-510.
112. Luca F, Hudson RR, Witonsky DB, et al. A reduced
representation approach to population genetic analyses and
applications to human evolution. Genome Res 2011;
doi:10.1101/gr.119792.110.
113. Hyten D, Cannon S, Song Q, et al. High-throughput
SNP discovery through deep resequencing of a
reduced representation library to anchor and orient scaffolds
in the soybean whole genome sequence. BMC Genome
2010;11:38.
114. You F, Huo N, Deal K, et al. Annotation-based genome-
wide SNP discovery in the large and complex Aegilops
tauschii genome using next-generation sequencing without
a reference genome sequence. BMC Genome 2011;12:59.
115. Gompert Z, Forister ML, Fordyce JA, et al. Bayesian analysis
of molecular variance in pyrosequences quantifies popula-
tion genetic structure across the genome of Lycaeides but-
terflies. Mol Ecol 2010;19:2455-73.
116. van Orsouw NJ, Hogers RCJ, Janssen A, et al. Complexity
Reduction of Polymorphic Sequences (CRoPS™): a novel
approach for large-scale polymorphism discovery in com-
plex genomes. PLoS ONE 2007;2:e1172.
117. Mammadov J, Chen W, Ren R, et al. Development of
highly polymorphic SNP markers from the complexity
reduced portion of maize [Zea mays L.] genome for use in
marker-assisted breeding. Theor Appl Genet 2010;121:577-88.
118. Baxter SW, Davey JW, Johnston JS, et al. Linkage mapping
and comparative genomics using next-generation RAD
sequencing of a non-model organism. PLoS ONE 2011;6:
e19315.
119. Huang X, Feng Q, Qian Q, et al. High-throughput genotyping
by whole-genome resequencing. Genome Res 2009;19:1068-76.
120. Andolfatto P, Davison D, Erezyilmaz D, et al. Multiplexed
shotgun genotyping for rapid and efficient genetic mapping.
Genome Res 2011;21:610-7.
121. Elshire RJ, Glaubitz JC, Sun Q, et al. A robust, simple
Genotyping-by-Sequencing (GBS) approach for high diver-
sity species. PLoS ONE 2011;6:e19379.
122. Ossowski S, Schneeberger K, Lucas-Lledo JI, et al. The rate
and molecular spectrum of spontaneous mutations in
Arabidopsis thaliana. Science 2010;327:92-4.
123. Pool JE, Hellmann I, Jensen JD, et al. Population genetic
inference from genomic sequence variation. Genome Res
2010;20:291-300.
124. Frary A, Nesbitt TC, Frary A, et al. fw2.2: A quantitative
trait locus key to the evolution of tomato fruit size. Science
2000;289:85-8.
125. Komatsuda T, Pourkeirandish M, He C, et al. Six-rowed
barley originated from a mutation in a homeodomain-
leucine zipper I-class homeobox gene. Proc Natl Acad Sci
USA 2007;104:1424-9.
126. Oraguzie NC, Rikkerink EHA, Gardiner SE, De Silva HN
(eds). Association mapping in plants. Springer, 2007.
127. Waugh R, Jannink JL, Muehlbauer GJ, et al. The emergence
of whole genome association scans in barley. Curr Opin Pl
Biol 2009;12:218-22.
128. Atwell S, Huang YS, Vilhjalmsson BJ, et al. Genome-wide
association study of 107 phenotypes in Arabidopsis thaliana
inbred lines. Nature 2010;465:627-31.
129. Mackay TFC, Stone EA, Ayroles JF. The genetics of quan-
titative traits: challenges and prospects. Nat Rev Genet 2009;
10:565-77.
130. Hall D, Tegstrom C, Ingvarsson PK. Using association map-
ping to dissect the genetic basis of complex traits in plants.
Brief Funct Genomic 2010;9:157-65.
131. Rafalski JA. Association genetics in crop improvement. Curr
Opin Pl Biol 2010;13:174-80.
132. Rosenberg NA, Huang L, Jewett EM, et al. Genome-wide
association studies in diverse populations. Nat Rev Genet
2010;11:356-66.
133. Yan J, Shah T, Warburton ML, et al. Genetic character-
ization and linkage disequilibrium estimation of a global
maize collection using SNP markers. PLoS ONE 2009;4:
e8451.
134. Deulvot C, Charrel H, Marty A, et al. Highly-multiplexed
SNP genotyping for genetic mapping and germplasm diver-
sity studies in pea. BMC Genome 2010;11:468.
135. Myles S, Chia JM, Hurwitz B, et al. Rapid genomic char-
acterization of the genus Vitis. PLoS ONE 2010;5:e8219.
136. Grattapaglia D, Silva-Junior O, Kirst M, et al.
High-throughput SNP genotyping in the highly heterozy-
gous genome of Eucalyptus: assay success, polymorphism
and transferability across species. BMC Pl Biol 2011;11:65.
137. Myles S, Boyko AR, Owens CL, et al. Genetic structure and
domestication history of the grape. Proc Natl Acad Sci USA
2011;108:3530-5.
138. Tian F, Bradbury PJ, Brown PJ, et al. Genome-wide asso-
ciation study of leaf architecture in the maize nested asso-
ciation mapping population. Nat Genet 2011;43:159-62.
139. Russell J, Dawson IK, Flavell AJ, et al. Analysis of >1000
single nucleotide polymorphisms in geographically matched
samples of landrace and wild barley indicates secondary con-
tact and chromosome-level differences in diversity around
domestication genes. New Phytol 2011;191:564-78.
140. Comadran J, Russell JR, Booth A, et al. Mixed model as-
sociation scans of multi-environmental trial data reveal
major loci controlling yield and yield related traits in
Hordeum vulgare in Mediterranean environments. Theor Appl
Genet 2011;122:1363-73.
141. Abdurakhmonov IY, Abdukarimov A. Application of asso-
ciation mapping to understanding the genetic diversity of
Plant Germplasm Resources. Int J Pl Genomic 2008;
doi:10.1155/2008/574927.
142. Durstewitz G, Polley A, Plieske J, et al. SNP discovery by
amplicon sequencing and multiplex SNP genotyping in the
allopolyploid species Brassica napus. Genome 2010;53:
948-56.
143. Zhao K, Wright M, Kimball J, et al. Genomic diversity and
introgression in O. sativa reveal the impact of domestication
and breeding on the rice genome. PLoS ONE 2010;5:
e10780.
144. Manolio TA, Collins FS, Cox NJ, et al. Finding the missing
heritability of complex diseases. Nature 2009;461:747-53.
145. Laird PW. Principles and challenges of genome-wide DNA
methylation analysis. Nat Rev Genet 2010;11:191-203.
146. Alkan C, Coe BP, Eichler EE. Genome structural variation
discovery and genotyping. Nat Rev Genet 2011;12:363-76.
147. Cooper GM, Shendure J. Needles in stacks of needles: find-
ing disease-causal variants in a wealth of genomic data.
Nat Rev Genet 2011;12:628-40.
148. Luo L, Boerwinkle E, Xiong M. Association studies
for next-generation sequencing. Genome Res 2011;21:
1099-108.
149. Swanson-Wagner RA, Eichten SR, Kumari S, et al.
Pervasive gene content variation and copy number variation
in maize and its undomesticated progenitor. Genome Res
2010;20:1689-99.
150. Bergelson J, Roux F. Towards identifying genes underlying
ecologically relevant traits in Arabidopsis thaliana. Nat Rev
Genet 2010;11:867-79.
151. Austin RS, Vidaurre D, Stamatiou G, et al. Next-generation
mapping of Arabidopsis genes. Plant J 2011;67:715-25.
152. Schneeberger K, Weigel D. Fast-forward genetics enabled
by new sequencing technologies. Trend Pl Sci 2011;16:282-8.
153. Delker C, Quint M. Expression level polymorphisms: her-
itable traits shaping natural variation. Trend Pl Sci 2011;16:
481-8.
154. Rakyan VK, Down TA, Balding DJ, et al. Epigenome-wide
association studies for common human diseases. Nat Rev
Genet 2011;12:529-41.
155. Schmitz RJ, Zhang X. High-throughput approaches for
plant epigenomic studies. Curr Opin Pl Biol 2011;14:130-6.
156. Khush GS. Green revolution: the way forward. Nat Rev
Genet 2001;2:815-22.
157. Varshney RK, Bansal KC, Aggarwal PK, et al. Agricultural
biotechnology for crop improvement in a variable climate:
hope or hype? Trend Pl Sci 2011;16:363-71.
158. Allendorf FW, Hohenlohe PA, Luikart G. Genomics and
the future of conservation genetics. Nat Rev Genet 2010;11:
697-709.
159. Angelow A, Schmidt M, Weitmann K, et al. Methods and
implementation of a central biosample and data manage-
ment in a three-centre clinical study. Comput Methods
Programs Biomed 2008;91:82-90.
160. Wan E, Akana M, Pons JCJ, et al. Green technologies for
room temperature nucleic acid storage. Curr Issues Mol Biol
2010;12:135-42.
161. Peplies J, Fraterman A, Scott R, et al. Quality management
for the collection of biological samples in multicentre stu-
dies. Eur J Epidemiol 2010;25:607-17.
162. Montes JM, Melchinger AE, Reif JC. Novel throughput
phenotyping platforms in plant genetic studies [abstract].
Trend Pl Sci 2007;12:433-6.
163. Zhu J, Ingram PA, Benfey PN, et al. From lab to field, new
approaches to phenotyping root system architecture. Curr
Opin Pl Biol 2011;14:310-7.
164. Gottwald S, Bauer P, Komatsuda T, et al. TILLING in the
two-rowed barley cultivar 'Barke' reveals preferred sites of
functional diversity in the gene HvHox1. BMC Res Notes
2009;17:258.
165. Hein I, Kumlehn J, Waugh R. Functional validation in the
Triticeae. In: Feuillet C, Muehlbauer GJ (eds). Genetics and
Genomics of the Triticeae. Plant Genetics and Genomics: Crops and
Models, Vol. 7. Springer: Science+Business Media, LLC,
New York, 2009:359-85.
166. Huson DH, Bryant D. Application of phylogenetic net-
works in evolutionary studies. Mol Biol Evol 2006;23:
254-67.